A Text Mining Model by Using Weighting Technology

Authors

  • Chenggen Shi
  • Jie Lu
Abstract

Latent Semantic Indexing (LSI) has proven to be a valuable analysis tool with a wide range of applications. However, choosing an appropriate number of dimensions for LSI remains a crucial challenge. This paper presents a document vector model that uses weighting technology to address this problem. Our experimental results demonstrate that the model can detect the structure of a dataset and help determine an appropriate number of dimensions for LSI.
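
The abstract does not say which weighting scheme the model uses. As a minimal sketch, assuming TF-IDF weighting (a common choice, not confirmed as the authors' method), weighted document vectors of the kind that could serve as input to LSI might be built as follows. The function names `tfidf_vectors` and `cosine` and the sample documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF weighted vector per document.

    Assumption: raw term frequency times log(N / document frequency).
    The paper's actual weighting scheme is not given in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: the first two documents share vocabulary, the third does not.
docs = [
    "latent semantic indexing text mining",
    "text mining term weighting",
    "decision trees classification",
]
vocab, vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

In this sketch, documents that share weighted terms end up with a higher cosine similarity than documents that share none, which is one simple way the grouping structure of a dataset can show up in weighted vectors. Stacking the vectors into a term-document matrix and inspecting its singular values would be the usual next step for judging how many LSI dimensions to keep; that step is omitted here.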


Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing the semantic concepts of text motivated researchers to use...


A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...


Arabic Text Classification Using Decision Trees

Text mining has drawn more and more attention recently; it has been applied to different domains, including web mining, opinion mining, and sentiment analysis. Text pre-processing is an important stage in text mining. The major obstacle in text mining is the very high dimensionality and the large size of text data. Natural language processing and morphological tools can be employed to reduce dimens...


Mining of Emerging Trends of COVID-19 Thematic Areas in National and International Publications

Background & Aim: The results from analyzing COVID-19 literature with text-mining techniques are of particular importance for researchers, policymakers, and planners of medical sciences at the national and international levels, helping to avoid parallel research and wasted time and budget. The paper explores emerging topics and the trend of scientific words at the national and internation...


A new term-weighting scheme for text classification using the odds of positive and negative class probabilities

Text classification is a core technique for text mining and information retrieval. It has been applied in many different research and industrial areas. Term weighting schemes have to assign an appropriate weight to each term to obtain high text classification performance. Although term weighting is one of the important modules for text classification, and text classificat...



Publication date: 2004